    /* Fasting status (macro AdditionalVars_Faste); see the Fasting section. */
    /* Blood sample drawn together with the questionnaire: use both sources. */
    if Tages_der_blodproeve_i_forbindel in: ('Ja') then do;
      if Er_patienten_fastende_=:'Ja' or NewFastende=:'Ja' then Faste='Ja';
      else if Er_patienten_fastende_=:'Nej' or NewFastende=:'Nej' then Faste='Nej';
    end;
    /* Blood sample not drawn with the questionnaire: use NewFastende only. */
    else if Tages_der_blodproeve_i_forbindel in: ('Nej') then do;
      if NewFastende=:'Ja' then Faste='Ja';
      else if NewFastende=:'Nej' then Faste='Nej';
    end;
    /* Timing unknown (blank): use both sources. */
    else if Tages_der_blodproeve_i_forbindel in (' ') then do;
      if Er_patienten_fastende_=:'Ja' or NewFastende=:'Ja' then Faste='Ja';
      else if Er_patienten_fastende_=:'Nej' or NewFastende=:'Nej' then Faste='Nej';
    end;
DD2 biobank
Primary data derived from DD2 research studies
When individuals are enrolled in DD2, blood and urine samples are collected and stored in the biobank in Vejle. The samples themselves are considered “primary DD2 enrollment data”, and they are all collected at DD2 baseline, i.e., the time of the DD2 enrollment defined by the variable reg_dato. No automatic or standard analyses are conducted, but DD2 research projects can have the samples analyzed if (additional) analyses are needed. Irrespective of when the analyses are performed, the timing for the analysis results will always be baseline DD2, because that was the time the blood/urine sample was collected. The results from analyses of the blood and urine samples are considered “Additional DD2 data”.
For further information about the initial idea about the biobank, please see Christensen et al. (2012).
The documentation for the biobank can be downloaded here (Danish, downloaded 12 January 2024):
Data sources
The data in the biobank are joined from multiple studies and datasets. The individual identifier in the biobank is the ProjektID, which is unique per CPR number. In general, the variable ProjektID ends with the digits -00. Another variable is Barcode, which is similar to ProjektID but ends with a different number (e.g. -12, -19, or -99) that denotes the specific sample for the individual (see the above links for biobank documentation).
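As a minimal sketch of how the two identifiers relate, the helper below maps a Barcode to its presumed ProjektID. It assumes (this is not stated in the biobank documentation) that both identifiers share everything before the final hyphen and differ only in the suffix. The function name and the example ID are hypothetical.

```python
def barcode_to_projektid(barcode: str) -> str:
    """Replace the sample suffix after the last hyphen with '00'.

    Assumption: Barcode and ProjektID share the same prefix and differ
    only in the numeric suffix (e.g. -12, -19, -99 versus -00).
    """
    prefix, sep, suffix = barcode.strip().rpartition("-")
    if not sep or not suffix.isdigit():
        raise ValueError(f"unexpected barcode format: {barcode!r}")
    return f"{prefix}-00"


# Hypothetical ID, for illustration only.
print(barcode_to_projektid("ABC123-12"))  # ABC123-00
```

If the assumption about the shared prefix does not hold for a given file, the join should instead go through whatever key file the biobank documentation specifies.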
Locally at Department of Clinical Epidemiology (DCE), the raw data files are stored in the folder:
O:\HE_KEA-DATA-RAW0050\DD2 data\Main Part - Local DD2 database\data\Input Data Sets\BioBank
Data files from the 2022/2023 data updates are stored in the folder:
O:\HE_KEA-DATA-RAW0050\DD2 data\Biomarkører
Below is a list of the biobank data files stored at DCE (files read in locally). File names might have changed since DCE initially received them.
First biomarkers
In the early phase of DD2, many biomarkers were analysed for the first 1,053 individuals enrolled in DD2 (some individuals have later withdrawn consent and are no longer in the DD2 cohort). These individuals are marked by the variable BlodProve1053patients (based on the CPR numbers in data files First1053Patients and UrinResultatermedRatio2013Nov) and are predominantly enrolled in 2011-2012.
First1053Patients.txt: The file appears to have been received in November 2012. It includes N=1,053 CPR numbers and these individuals are most likely everyone in the database at that time. The file includes results from the initial blood samples. The dataset lacks date information and also unit specifications for the variables hæmolyse, icteri, and lipæmi. The dataset includes the variables:
- CPR
- ProjektID
- C-peptid (N=1,053, pmol/L)
- GAD (N=1,053, kU/L)
- Glucose (N=1,050, mmol/L)
- ALAT (N=1,030, U/L)
- Hæmolyse (N=1,030)
- Icteri (N=1,030)
- Lipæmi (N=1,030)
- AMYLP (N=1,041, U/L)
- CRP (N=1,041, mg/L)
It is recommended not to use c-peptide measurements from this file, since the samples are analysed using an old type of analysis kit (see below).
Information about some of the variables can be found in Mor et al. (2014).
UrinResultatermedRatio2013Nov.txt: DCE likely received the data file in December 2013. It includes N=1,053 CPR numbers (same as in First1053Patients) with results from the initial urine samples. The dataset includes the variables:
- CPR
- Barkode (ends with -19)
- ALBu_mg_L_ (N=1,041, mg/L, dated November 19, 20, and 23, 2012)
- KREA_mmol_L_ (N=1,041, mmol/L, dated November 19, 20, and 23, 2012)
- UPROT_g_L_ (N=1,041, g/L, dated November 19, 20, and 23, 2012)
- Albumin_Kreatinin_ratio (N=1,041)
- Kommentar (N=473, <5 with “PROT-U > 2.5 g/l” and the rest with “ALB-U <3 mg/L”)
Additional biomarkers
DataTilDD2medFastende.txt: DCE probably received the file in September 2015. It includes N=5,996 CPR numbers with character results on GAD, glucose, and c-peptide, along with a variable on fasting status. The majority of the individuals in the file were enrolled before 2015, but the file covers only around 80-85% of all individuals enrolled before 2016. There are no dates in the data. The dataset includes the variables:
- Id (ProjectID, ends with -00)
- GAD (N=131 numeric values, N=5,797 with the value “<0,000”, and N=51 with “>525,000”)
- Glucose (N=4,457)
- Cpeptid (N=5,964)
- Fastende (“Ja” for N=2,891, “Nej” for N=397, and “Ved ikke” for the remaining N=2,708) (see below for more information about fasting blood samples)
The majority of the initial 1,053 individuals are also included in this data file. It was initially recommended to use the c-peptide measurements in this data file and not the original data file, First1053Patients.txt. Later, in 2023, DCE received new data with fully updated c-peptide and glucose values (see below).
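Because the results in this file are stored as character values, including censored entries such as "<0,000" and ">525,000", they need to be parsed before numeric use. The sketch below shows one possible approach. It assumes that the comma is a decimal separator (Danish convention), which the documentation does not confirm. The names `Measurement` and `parse_result` are hypothetical.

```python
from typing import NamedTuple, Optional


class Measurement(NamedTuple):
    value: Optional[float]  # numeric part, None if the field is empty
    censoring: str          # "<", ">" or "" for an exact value


def parse_result(raw: str) -> Measurement:
    """Parse a numeric-looking character result such as '12,5' or '<0,000'.

    Assumption: ',' is the decimal separator, so '>525,000' means > 525.0.
    """
    text = raw.strip()
    if not text:
        return Measurement(None, "")
    censoring = text[0] if text[0] in "<>" else ""
    number = text[len(censoring):].strip().replace(",", ".")
    return Measurement(float(number), censoring)


print(parse_result("<0,000"))
print(parse_result(">525,000"))
```

Keeping the censoring flag next to the number, rather than silently substituting the limit, leaves the choice of how to handle values outside the measurable range to the individual analysis.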
WrongCPeptideMeasurements.txt: The file appears to have been received in November 2015. It includes N=105 ID numbers (end with -00) and the variables cpeptid and NyCpeptid. The analysis was performed to compare values from different analysis kits, and the values were quite different. It is recommended not to use any of the measurements from this file.
MBL data
DCE received variables and data regarding mannose-binding lectin (MBL) in September 2017 as part of the PhD project Gedebjerg (2020). See also DD2 project description.
DD2 resultater.xlsx (and DD2 resultater_10_0095, which is the version where <10 is replaced by 10 and <0,095 by 0,095): DCE received the file in September 2017. It includes N=7,519 barcodes (end with -99) in sheets of 100 or 101 rows, and should include CRP and MBL on everyone enrolled by December 2016. See Gedebjerg et al. (2023) for more information. The dataset includes the variables:
- barkode (ends with -99)
- CRP (N=7,510, mg/L)
- MBL (N=7,514, µg/L)
DCE was told to keep the CRP measurements from the original data file (First1053Patients.txt) and this file separate. The unit for CRP is mg/L in both data files.
Resultater den 250917 Anne Gedebjerg.xlsx: DCE received the file in September 2017. It includes N=3,116 barcodes (end with -99) and variables regarding MBL expression genotyping (six SNPs in the MBL2 gene). The genotyping was done for the first ~3,000 individuals enrolled in DD2. See Gedebjerg et al. (2020) for more information. The dataset includes the variables:
- barkode (ends with -99)
- HL
- XY
- PQ
- 52
- 54
- 57
- HAPLOTYPE
April 2022 data
During 2022-2023 DCE received additional data on CRP, c-peptide, and glucose. A file with CRP was received in October 2022, but it is fully included in a file from January 2023 which also includes c-peptide and glucose. The October file is therefore not used during uploads, whereas the January 2023 file has been uploaded to the servers.
DD2_cRP_Glucose_Cpep_2022_resultater (1).xlsx: The data file includes N=3,399 ProjektIDs. They were all enrolled after the first N=1,053 individuals, but the file does not cover all individuals enrolled in any given year. We don’t know exactly why they were analysed, but it might have been part of the IDA study. The dataset includes the variables: projekt_id, Cpeptid_Barkode, Cpeptid_Resultat, Cpeptid_Måleenhed, Cpeptid_Antal_decimaler, Cpeptid_Dato, Cpeptid_Notat, CRPHS_Barkode, CRPHS_Resultat, CRPHS_Måleenhed, CRPHS_Antal_decimaler, CRPHS_Dato, CRPHS_Notat, Glukose_Barkode, Glukose_Resultat, Glukose_Måleenhed, Glukose_Antal_decimaler, Glukose_Dato, Glukose_Notat.
- projekt_id (N=3,339)
- c-peptid (N=2,933, pmol/L, dated 30APR2022 or 01MAY2022)
- CRP (N=2,478, mg/L, dated 30APR2022 or 01MAY2022)
- Glucose (N=3,055, mmol/L, dated 02APR2022 or 03APR2022)
C-peptide and glucose
During the summer of 2023, DCE received data on c-peptide (July 2023) and glucose (August 2023). These files include all cleaned c-peptide and glucose measurements from the biobank, and results from these files thus replace all measurements from earlier datasets. We now have the files:
dd2_all_C_peptide_14July2023.xls: Includes N=9,762 ProjektIDs with data on c-peptide. The file also includes information about analysis date, sampling date, freezing date, unit, and kit. The following data management has been done by DD2 before DCE received the file:
- All 1,254 observations that had "old" measurements (before 28 February 2015) have been replaced with updated measurements made with the new kit/assay (after 28 February 2015).
- 6 observations that had only an "old" measurement and NO updated re-measurement have been deleted.
- The data have been cleaned so that each individual appears with only one measurement, analysed with the new kit/assay.
Please note that the unit for c-peptide has changed from pmol/L to nmol/L.
DD2_glukoser_2023_08_23.xlsx: The data file includes N=9,563 projekt_id and information about glucose, units, and dates.
HOMA
HOMA values are calculated based on c-peptide and glucose. We use the Oxford calculator which can be downloaded from the website: https://www.dtu.ox.ac.uk/homacalculator/download.php. Since summer 2024, access to the HOMA calculator requires a licence (should be free of charge for “academic researchers”).
HOMA values in the data are calculated based on the c-peptide and glucose measurements received during the summer of 2023. HOMA has only been estimated for glucose values in the interval 3.0-25 and c-peptide values in the interval 0.2-3.5 (because values out of range cause problems in the Excel version of the updated Oxford HOMA calculator). Some individuals might therefore have glucose and c-peptide values but no HOMA values in the new data. HOMA values are estimated regardless of fasting status.
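The range restriction can be expressed as a simple eligibility check. The following is a minimal sketch, assuming that the interval limits are inclusive and that the values are in the units expected by the calculator; neither detail is stated in this documentation. The function and constant names are hypothetical.

```python
GLUCOSE_RANGE = (3.0, 25.0)   # interval quoted in the text
CPEPTIDE_RANGE = (0.2, 3.5)   # interval quoted in the text


def homa_eligible(glucose, cpeptide):
    """Return True if both values are present and inside the stated ranges.

    Assumption: the interval limits are inclusive.
    """
    if glucose is None or cpeptide is None:
        return False
    return (GLUCOSE_RANGE[0] <= glucose <= GLUCOSE_RANGE[1]
            and CPEPTIDE_RANGE[0] <= cpeptide <= CPEPTIDE_RANGE[1])


print(homa_eligible(6.5, 0.9))   # True
print(homa_eligible(2.8, 0.9))   # False, glucose below 3.0
```

Individuals for whom this check fails keep their glucose and c-peptide values but have no HOMA value.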
“Pladebiomarkører”
During 2022-2023, DCE received new data from additional biomarker analyses. Because of the way the analyses were performed, the new biomarkers are referred to as “pladebiomarkører”, as opposed to the previous ones, which are called “målebiomarkører”. In practice, everyone enrolled as of the day the blood samples were taken from the biobank was included in the analysis. This was at the beginning of 2022 and includes approximately the first 9,200 individuals. A small number of individuals have multiple measurements for specific biomarkers, most likely due to sample dilution during the analysis process. Data are in long format and include a total of 22 different biomarkers, all with unit pg/ml. The 22 biomarkers are listed here, and the dates refer to when DCE received the data files:
- TNF-a (April 2022, N=9,202)
- IL-6 (April 2022, N=9,195)
- Ang-Like4 (November 2022, N=9,200)
- FGF-21 (November 2022, N=9,200)
- FGF-23 (November 2022, Hu FGF-23, N=9,200)
- IL1-RA (November 2022, N=9,200)
- Leptin (November 2022, N=9,196)
- RAGE (November 2022, soluble, N=9,200)
- Sclerostin (November 2022, N=9,200)
- U-PAR (November 2022, N=9,200)
- Osteocalcin-1 (February 2023, N=9,203)
- CD163 (April 2023, N=9,047)
- Galectin-3 (April 2023, N=9,008)
- GDF-15 (April 2023, N=9,046)
- NT-proBNP (April 2023, N=9,048)
- Resistin (April 2023, N=9,046)
- Serpin (April 2023, N=9,047)
- YKL-40 (April 2023, N=9,045)
- Osteopontin (June 2023, N=9,204)
- Adiponectin (July 2023, N=9,204)
- Follistatin (July 2023, N=9,204)
- MPO (July 2023, N=9,204)
An overview of the biomarkers (table from the grant application) can be found here:
An additional document combining overview sheets, method descriptions, and quality logs from some of the analysis rounds can be downloaded here:
The data files with “pladebiomarkører” and “målebiomarkører” are not using the same format (e.g., long vs. wide format) and are therefore not combined.
Data files (pladebiomarkører)
This section is an overview of the data files DCE received.
April 2022, 220211 Vplex_final.xlsx with 2 sheets (data and background information). Data were received in April 2022 but the analysis was probably performed in February 2022 based on the date stamps in file names. The data file includes data from N=9,294 individuals on IL-6 and TNF-a. The first sheet, Vplex sample results_final, includes the variables: Sample (id, ends with -12), Sample_Group (=Sample in all rows), Assay (either TNF-a or IL-6), Calc__Conc__Mean (results), RANGE (value 1 or 2), Plate_Name (each Plate_Name is used 78 times).
- TNF-a (N=9,202, pg/ml)
- IL-6 (N=9,195, pg/ml)
The second sheet, Vplex complete final, includes background information (raw data, “rådata”) about the sample from the sample_groups Sample (N=9,024), Standards (N=1,888), and Internal Control (N=236). The sheet includes the variables Plate_Name, Sample_Group, Sample, Assay, Well, Signal, Mean, CV, Calc__Concentration, Calc__Conc__Mean, Calc__Conc__CV, __Recovery, __Recovery_Mean, Detection_Limits__Calc__Low, Detection_Limits__Calc__High, Detection_Range, Detection_Range_yesno, Quantification_range, Quantification_range_yesno, RANGE.
DCE also received the data file DD2 quality log_panel 1 edited_TNF IL6.xlsx, but it has not been used.
November 2022, 8plex data final.xlsx, with 9 sheets (overview + 8 biomarkers). DCE received the data file in November 2022. There are no date stamps indicating when the analyses were performed. The data file includes information on N=9,204 individuals (based on sample ID ending with -12). For each assay, the data file includes the variables sample (ID), assay, calc__conc__mean (result), RANGE, and plate_name. DCE was informed that the unit is pg/ml for all assays.
- Ang-Like4 (N=9,200, pg/ml)
- FGF-21 (N=9,200, pg/ml)
- FGF-23 (Hu FGF-23, N=9,200, pg/ml)
- IL1-RA (N=9,200, pg/ml)
- Leptin (N=9,196, pg/ml)
- RAGE (soluble, N=9,200, pg/ml)
- Sclerostin (N=9,200, pg/ml)
- U-PAR (N=9,200, pg/ml)
DCE also received the data files eight biomarkers with CPR.xlsx and seven biomarkers with CPR.xls, but these have not been used.
February 2023, DD2 osteocalcin final.xlsx with 3 sheets (overview, results, and rådata). DCE received the file in February 2023, but there is no indication of when the analyses were performed. The file includes N=9,204 individuals (sample, ends with -12). The data file includes the variables sample (ID), assay, calc__conc__mean (result), RANGE, and plate_name.
- Osteocalcin-1 (N=9,203, pg/ml)
The sheet rådata includes detailed information about each of the plates.
April 2023, DD2_7plex_data_final.xlsx with 9 sheets (overview + 7 biomarkers + additional sheet with sample names). DCE received the data file in April 2023. There are no date stamps indicating when the analyses were performed. The data file includes information on N=9,048 individuals (based on sample ID ending with -12). For each assay, the data file includes the variables sample (ID), assay, calc_conc_mean (result), RANGE, and plate_name. DCE was informed that the unit is pg/ml for all assays.
- CD163 (N=9,047, pg/ml)
- Galectin-3 (N=9,008, pg/ml)
- GDF-15 (N=9,046, pg/ml)
- NT-proBNP (N=9,048, pg/ml)
- Resistin (N=9,046, pg/ml)
- Serpin (N=9,047, pg/ml)
- YKL-40 (N=9,045, pg/ml)
June 2023, Osteopontin data final (1).xlsx with 3 sheets (overview + data + rådata). DCE received the file in June 2023, but there is no indication of when the analyses were performed. It includes N=9,207 ID numbers (end with -12, plus a note that it means “EDTA plasma fraction”) with data on osteopontin.
- Osteopontin (N=9,204, pg/ml)
July 2023 (1), DD2_adiponectin_blue panel_final (1).xlsx with 3 sheets (overview + data + rådata). DCE received the file in July 2023, but there is no indication of when the analyses were performed. It includes N=9,204 ID numbers (sample, end with -12) with data on adiponectin.
- Adiponectin (N=9,204, pg/ml)
The sheet rådata includes detailed information about each of the plates.
July 2023 (2), DD2 red panel_final.xlsx with 4 sheets (overview + data (Follistatin + MPO) + rådata). DCE received the file in July 2023, but there is no indication of when the analyses were performed. It includes N=9,204 ID numbers (sample, end with -12) with data on follistatin and MPO.
- Follistatin (N=9,204, pg/ml)
- MPO (N=9,204, pg/ml)
The sheet rådata includes detailed information about each of the plates.
Fasting
Was the individual fasting when the blood sample was drawn? A simple question, yet difficult to answer.
Upon enrollment, the individuals are instructed to fast: no food or liquid (except water) from 10 o’clock the night before. Also, while fasting, the patient should not take any glucose-regulating drugs. Data from the DD2 questionnaire itself include the variable Er_patienten_fastende_ (whether the patient is fasting). Currently, around 75% of the individuals have answered that they are fasting. This variable can be used on its own, but should probably be combined with the variable Tages_der_blodproeve_i_forbindel (whether the blood sample was taken at the same time as the questionnaire was answered). If the fasting patients are restricted to include only those whose blood sample was drawn at the same time as the questionnaire was answered, then around 72% of the individuals are defined to be fasting.
The data file DataTilDD2medFastende.txt received in September 2015 includes information on fasting state for N=5,996 individuals (“Ja” for N=2,891, “Nej” for N=397, and “Ved ikke” for N=2,708 individuals). This file will not be updated, so we will not get new information on this fasting variable. We don’t know how this variable was defined, but it is probably based on information on the blood sample itself. By adding the information from this file (variable name NewFastende) to the variable Er_patienten_fastende_, an additional 447 individuals can be defined as fasting (410 with missing information in Er_patienten_fastende_ and 37 who replied not to be fasting in Er_patienten_fastende_).
Currently, the fasting state is defined by the (SAS) macro AdditionalVars_Faste, which combines all the files and defines the individual as fasting if we have any indication that this could be the case.
Note: DD2 plans to make an “official” definition of Faste.
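As a hedged illustration of the rule described above (an individual counts as fasting if any applicable source says so), the SAS conditions can be mirrored in Python roughly as follows. This is a sketch, not the macro itself: the function and argument names are hypothetical, and values are stripped of surrounding blanks before the prefix comparison, which is an assumption about how the character variables are stored.

```python
def derive_faste(blood_with_questionnaire, questionnaire_fasting, new_fastende):
    """Return 'Ja', 'Nej' or None (undetermined).

    blood_with_questionnaire: value of Tages_der_blodproeve_i_forbindel
    questionnaire_fasting:    value of Er_patienten_fastende_
    new_fastende:             value of NewFastende
    """
    def starts(value, prefix):
        # Prefix comparison, similar in spirit to the SAS '=:' operator.
        return (value or "").strip().startswith(prefix)

    timing = (blood_with_questionnaire or "").strip()
    if timing.startswith("Nej"):
        # Sample not drawn with the questionnaire: trust NewFastende only.
        sources = [new_fastende]
    elif timing.startswith("Ja") or timing == "":
        sources = [questionnaire_fasting, new_fastende]
    else:
        return None
    if any(starts(s, "Ja") for s in sources):
        return "Ja"
    if any(starts(s, "Nej") for s in sources):
        return "Nej"
    return None


print(derive_faste("Ja", "Nej", "Ja"))   # Ja
print(derive_faste("Nej", "Ja", None))   # None
```

Note that "Ja" takes precedence over "Nej" when the sources disagree, which is what makes the definition inclusive.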
Data documentation
biobank.sas7bdat
| Format (var x obs) | Id variables | Unique key | Important dates |
|---|---|---|---|
| Wide (48 x 11,381) | CPR, ProjektID | CPR (ProjektID) | VejleDato |
All data files except the 22 variables from the “pladebiomarkører” are combined and included in the biobank dataset. A few variables from dd2core (e.g. reg_dato and Er_patienten_fastende_) are also included in the biobank dataset.
Data include analysis results from successful analyses. Not all analyses are performed for all individuals (missing data), and we don’t have more information about specific analyses (i.e., project, analysis method, unit/kit, non-successful analyses etc.). In some versions of the dataset, rows are included for all CPR numbers in the population, even if no analysis results are available.
| Row | CPR | ProjektID | Analysis1 | Analysis2 | Analysis3 | … |
|---|---|---|---|---|---|---|
| 1 | CPR1 | ProjektID1 | num. | num. | | … |
| 2 | CPR2 | ProjektID2 | num. | num. | | … |
| 3 | CPR3 | ProjektID3 | | | | … |
| 4 | CPR4 | ProjektID4 | num. | num. | num. | … |
| … | … | … | … | … | … | … |
| 12,098 | CPR12098 | ProjektID12098 | num. | num. | num. | … |
biomark.sas7bdat
| Format (var x obs) | Id variables | Unique key | Important dates |
|---|---|---|---|
| Long (9 x 201,243) | CPR, ProjektID, ydernr | CPR*Assay | (Vejledato) |
The biomark dataset includes analysis results from the 22 “pladebiomarkører”. No dates are included in the dataset; however, all analyses are performed on the enrollment blood sample. The dataset includes approximately 9,200*22=202,400 rows. In principle, CPR*Assay should be the unique key, but some analyses are performed multiple times per individual (due to dilution in the analysis).
| Row | CPR | ProjektID | Assay | Value | Info | … |
|---|---|---|---|---|---|---|
| 1 | CPR1 | ProjektID1 | TNF-a | num. | … | … |
| 2 | CPR1 | ProjektID1 | IL-6 | num. | … | … |
| 3 | CPR1 | ProjektID1 | Ang-Like4 | num. | … | … |
| … | … | … | … | … | … | … |
| 22 | CPR1 | ProjektID1 | MPO | num. | … | … |
| 23 | CPR2 | ProjektID2 | TNF-a | num. | … | … |
| 24 | CPR2 | ProjektID2 | IL-6 | num. | … | … |
| … | … | … | … | … | … | … |
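Since CPR*Assay is not strictly unique, it can be useful to check the key before reshaping the long data to one row per individual. The sketch below is one possible approach in plain Python. The function name and the tuple layout of the input are hypothetical, and the example values are invented for illustration.

```python
from collections import defaultdict


def pivot_biomarkers(rows):
    """Reshape (cpr, assay, value) rows to wide format.

    Returns (wide, duplicates). Keys with more than one measurement are
    reported in 'duplicates' and left out of 'wide', because the
    documentation does not say which repeated measurement to keep.
    """
    values = defaultdict(list)
    for cpr, assay, value in rows:
        values[(cpr, assay)].append(value)

    duplicates = {key: vals for key, vals in values.items() if len(vals) > 1}
    wide = defaultdict(dict)
    for (cpr, assay), vals in values.items():
        if len(vals) == 1:
            wide[cpr][assay] = vals[0]
    return dict(wide), duplicates


# Invented example rows.
rows = [("CPR1", "TNF-a", 1.2), ("CPR1", "IL-6", 0.8),
        ("CPR2", "TNF-a", 2.0), ("CPR2", "TNF-a", 2.4)]
wide, duplicates = pivot_biomarkers(rows)
print(wide)        # {'CPR1': {'TNF-a': 1.2, 'IL-6': 0.8}}
print(duplicates)  # {('CPR2', 'TNF-a'): [2.0, 2.4]}
```

Returning the duplicates separately keeps the decision about repeated measurements explicit instead of silently picking one.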